Boost-R: Gradient boosted trees for recurrence data
Authors
Abstract
Recurrence data arise from multi-disciplinary domains spanning reliability, cyber security, healthcare, online retailing, etc. This paper investigates an additive-tree-based approach, known as Boost-R (Boosting for Recurrence Data), for recurrent event data with both static and dynamic features. Boost-R constructs an ensemble of gradient boosted additive trees to estimate the cumulative intensity function of the recurrent event process, where a new tree is added to the ensemble by minimizing the regularized L2 distance between the observed and predicted cumulative intensities. Unlike conventional regression trees, a time-dependent function is constructed on each tree leaf. The sum of these functions, from multiple trees, yields the ensemble estimator of the cumulative intensity. The divide-and-conquer nature of tree-based methods is appealing when hidden sub-populations exist within a heterogeneous population. The non-parametric nature of regression trees helps avoid parametric assumptions on the complex interactions between event processes and features. Critical insights and advantages of Boost-R are investigated through comprehensive numerical examples. Datasets and computer code are made available on GitHub. To our best knowledge, Boost-R is the first gradient boosted additive-tree-based approach for modeling large-scale recurrent event data with both static and dynamic feature information.
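The boosting loop the abstract describes — repeatedly adding a tree that reduces the L2 distance between observed and predicted cumulative intensity — can be sketched on toy data. The following is a hedged illustration, not the authors' Boost-R implementation: it uses scikit-learn's `DecisionTreeRegressor` with time supplied as an ordinary split feature, which is a simplification of the paper's per-leaf time-dependent functions, and all names (`nu`, `grid`, etc.) are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy recurrence data: n units, each with one static feature x.
# True cumulative intensity: Lambda(t | x) = (1 + x) * t on [0, 1].
n = 200
grid = np.linspace(0.0, 1.0, 21)[1:]          # evaluation times t_1..t_20
x = rng.uniform(0.0, 1.0, size=n)

# Sample each unit's counting process N_i(t) at the grid points by
# cumulating independent Poisson increments (a valid Poisson-process path).
dt = np.diff(np.concatenate([[0.0], grid]))
inc = rng.poisson((1 + x[:, None]) * dt)
N = np.cumsum(inc, axis=1)

# Flatten to a regression problem: features (x_i, t), target N_i(t).
X = np.column_stack([np.repeat(x, grid.size), np.tile(grid, n)])
y = N.ravel().astype(float)

# L2 boosting: each shallow tree fits the residual between the observed
# cumulative counts and the current cumulative-intensity estimate.
pred = np.zeros_like(y)
nu = 0.1                                      # learning rate (shrinkage)
trees = []
for _ in range(100):
    tree = DecisionTreeRegressor(max_depth=2).fit(X, y - pred)
    pred += nu * tree.predict(X)
    trees.append(tree)

# Ensemble estimate of Lambda(1 | x_i); the true value is 1 + x_i.
est = pred.reshape(n, grid.size)[:, -1]
```

The divide-and-conquer behavior the abstract highlights shows up here as tree splits on the static feature `x`, which partition the heterogeneous population into subgroups with their own intensity estimates.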
Similar resources
Finding Influential Training Samples for Gradient Boosted Decision Trees
We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model’s predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric model...
Gradient Boosted Decision Trees for High Dimensional Sparse Output
In this paper, we study gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output space is an L-dimensional 0/1 vector, where L is the number of labels that can grow to millions and beyond in many modern applications. We show that vanilla GBDT can easily run out of memory or encounter near-forever running ti...
GB-CENT: Gradient Boosted Categorical Embedding and Numerical Trees
Latent factor models and decision tree based models are widely used in tasks of prediction, ranking and recommendation. Latent factor models have the advantage of interpreting categorical features by a low-dimensional representation, while such an interpretation does not naturally fit numerical features. In contrast, decision tree based models enjoy the advantage of capturing the nonlinear inte...
Web-Search Ranking with Initialized Gradient Boosted Regression Trees
In May 2010 Yahoo! Inc. hosted the Learning to Rank Challenge. This paper summarizes the approach by the highly placed team Washington University in St. Louis. We investigate Random Forests (RF) as a low-cost alternative algorithm to Gradient Boosted Regression Trees (GBRT) (the de facto standard of web-search ranking). We demonstrate that it yields surprisingly accurate ranking results — compa...
Optimization with Gradient-Boosted Trees and Risk Control
Decision trees effectively represent the sparse, high dimensional and noisy nature of chemical data from experiments. Having learned a function from this data, we may want to thereafter optimize the function, e.g., picking the best chemical process catalyst. In this way, we may repurpose legacy predictive models. This work studies a large-scale, industrially-relevant mixed-integer quadratic opt...
Journal
Journal title: Journal of Quality Technology
Year: 2021
ISSN: 2575-6230, 0022-4065
DOI: https://doi.org/10.1080/00224065.2021.1948373